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Abstract. We consider the relation between the microscopic and effective 
descriptions of the unfolding experiment on a model polypeptide. We evaluate the 
probability distribution function of the performed work by Monte Carlo simulations 
and compare it with that obtained by evaluating the work distribution generating 
\^ , function on an effective Brownian motion model tailored to reproduce exactly 

s ! ' the equilibrium properties. The agreement is satisfactory for fast protocols, but 

deteriorates for slower ones, hinting at the existence of processes on several time scales 
even in such a simple system. 
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1. Introduction 

By means of atomic force microscopes and optical tweezers, several experimental groups 
have been able to control very precisely the force applied on proteins and nucleic 
acids. The observation of the unfolding behavior of these molecules under mechanical 
stress represents a powerful tool to recover structural properties of proteins and nucleic 
acids [1-9]. 

Furthermore, in the case of small biomolecules, unfolding experiments represent 
an excellent test bed for a class of recently derived results, which go under the 
name of fluctuation relations [11-13]. These relations connect the energy exchanged 
by thermodynamical systems with their environment to their equilibrium properties, 
and represent therefore an intriguing bridge between equilibrium and nonequilibrium. 
The case of small biomolecules under external force is interesting in itself, since the 
fluctuations of the energy exchanged by these systems are of the order of their average 
thermal energy, and they are therefore an example of microscopic out-of-equilibrium 
systems, with very large thermal fluctuations, whose study is one of the current topical 
problem in statistical mechanics [14]. 

As a force is applied on a biomolecule, and it progressively unfolds, the external 
pulling device performs thermodynamical work on it. By sampling this quantity 
over many repetitions of the unfolding experiments, and taking advantage of suitable 
fluctuations relations [11-13], it has been possible to estimate experimentally the 
free energy difference between the folded and the unfolded state of a simple RNA 
hairpin [10] and the free energy landscape of some proteins as a function of the molecular 
elongation [15,16]. Convergence in such estimate is dominated by the so-called outliers, 
i.e., rare values of the work that are much smaller than the average. The interest in 
the study of distribution functions of work performed on biomolecules during unzipping 
experiments and in particular of the distribution tails, is due to the need to estimate 
the frequency of the rare events that ensure validity of the fluctuation relations. 

The evaluation of the work probability distribution function (PDF) for a 
manipulated system requires in principle the solution of an evolution equation of 
complexity equivalent to the equation for the microscopic dynamics. Since this equation 
involves a large number of degrees of freedom even for comparatively small systems 
like a polypeptide, it is customary to describe its dynamics by a small set of collective 
coordinates undergoing a Brownian diffusion process [13,17-20]. It is worth to note that 
in their works, Hummer et al. [20] showed that if the molecular unfolding is described 
as a one dimensional diffusion process along a structured energy potential, the force- 
induced rupture rate exhibits a behaviour which is much more complex than the widely 
used approach based on the Bell's formula. Here we investigate the relation between 
the two levels of description on a simple model of a polypeptide unfolding experiment, 
and differently from [20], we focus on the description of the work distribution, rather 
than on the description of the unfolding rate. 
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2. The work distribution 

Let us consider a system whose microscopic state is identified by the variable x, where x 
can also indicate a collection of microscopic coordinates, e.g., the positions and momenta 
of the particles which make up the system. We shall assume that the system evolves 
according to a general dynamic process, parametrized by a parameter //, which can be 
manipulated according to a fixed protocol fi(t). The evolution can be deterministic or 
stochastic, but we shall assume that, for any given value of fj,, there is a well-defined 
equilibrium distribution that can be represented in the Boltzmann-Gibbs form 

W = —7—> C 1 ) 

a relation which defines the Hamiltonian H(x,fi) and the partition function = 
J dx e-P H \ x 'V) , Here (3 = 1/ksT, where T is the absolute temperature and ks 
Boltzmann's constant. Thus H(x,fi) depends explicitly on the time only via n(t). 
The probability distribution function (PDF) P(x, t) of the microscopic state x evolves 
according to the Liouville-like partial differential equation 

d t P{x,t) = £ fl p{x,t), (2) 

where £ M is a linear differential operator, compatible with the equilibrium distribution 
of the system for any fixed value of \i: C^P^ = 0, Wfi. 

The external manipulation of the system via /i leads to an energy exchange with 
the environment. According with the usual conventions in statistical mechanics (see, 
e.g., [21]), the fluctuating work W performed on the system, given the manipulation 
protocol n(t) and the microscopic trajectory x(t), is given by 

W= fdt' fi(t')d^H(x,fi)\ x(tl)Mt/) . (3) 

J 

Under these hypotheses, the time evolution of the joint PDF 4>{x, W, t) of the microscopic 
state x and the total work W performed on the system is governed by the partial 
differential equation [17, 18] 

d t (f){x,W,t) =C^{x,W,t) - fi{t) d^Hix, n% {t) d W (f){x,W,t). (4) 

In order to simplify the analysis, one evaluates the generating function i/j(x, X,t) of the 
distribution of W, defined by 

ip(x,\,t) = J dWe iXW (f)(x } W,t), (5) 

so that eq. (jlj) becomes 

d t ^{x,X,t) = £^{x,\,t)+i\d t Hi(){x,\,t). (6) 

This equation can be solved explicitly if the system is characterized by discrete states: in 
ref. [22], e.g., an RNA hairpin was modelled as a three-state system and the PDF of the 
work, done on the molecule by an external mechanical force, was evaluated numerically. 

One can evaluate the solution of eqs. ([5H6]) for real A, starting from the initial 
condition ip(x, A, 0) = P eq (0), VA. Thus, since <p(x, W, t) is real, we have ip(x, — A, t) = 
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if>*(x, X,t), and we can restrict ourselves to the half-line A > 0. Then one can separate 
eqs. ([6]) into two equations, one for the real part ip R and one for the imaginary part ipi 
of if), obtaining 

d t ip R (x,X,t) = Cf,ip R (x,X,t) - Xd t Hipi(x, X,t), (7) 
d t ipi(x, X, t) = C^i(x,X,t) + Xd t Hif) R {x,X,t). (8) 

Once the function if)(x, A, t) has been obtained, the joint PDF <f>(x, W, t) is given by 
the inverse Fourier transform of ip(x,X,t). The unconstrained work PDF can be then 
evaluated from the relation $(W,t) = J dx<fi(x, W, t). 

An important special case obtains when one considers a system described by few 
degrees of freedom, which is in contact with a large heat reservoir. In this case, 
it is often warranted to assume that the microscopic state x performs a Brownian 
motion [13, 17-20], and thus the operator £ M has the form of a Fokker-Planck (FP) 
differential operator: 



^ dx 



d 

d x H(x, /i) ■ + k sT— 



(9) 



where we take into account the Einstein relation between the diffusion and the kinetic 
(mobility) coefficients D = YksT. 



3. Collective coordinates 



In an unfolding experiment, where a mechanical force is applied to one or both the 
free ends of a biopolymer, the work can be sampled by monitoring the extension of 
the molecules at different times [10, 15, 16]. In this situation we face the following 
problem. Equations ([2]), (T4J) and (J6]) describe the dynamics of a system at a very 
detailed, microscopic level. On the other hand, the behavior of the system is accessed 
only via the measurement of a few, and most often only one, observables, such as 
the elongation. Moreover, the microscopic "Liouville" operator is not generally known 
with sufficient confidence. In any case, the explicit solution of the evolution equations 
becomes unfeasible as soon as more than a few degrees of freedom have to be considered. 

Thus one considers descriptions of the system through some experimentally 
accessible collective coordinates. In the case of biopolymers, one typically chooses the 
end-to-end length L. Its equilibrium distribution is determined by the effective free 
energy, defined by 

F(L, /i) = -/T 1 In J dx 5{L{x) - L) e ^ H ^l (10) 

which plays the role of an effective hamiltonian. The dependence of this free energy on 
L is the target of several experimental studies, performed by using a suitable fluctuation 
relation [15, 16]. 

It should be possible in principle to obtain the evolution for the collective coordinate 
PDF, and thus the work distribution, by projecting the microscopic "Liouville" 
equations on the space spanned by the collective coordinates [23]. However, one 
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would then in general obtain complicated non-Markovian evolution equations, whose 
parameters will depend on unknown details of the underlying microscopic dynamics. 
In other words, even if eqs. (J2H6]) were exact, it would be hard to derive the explicit 
equations governing the time evolution of the PDF P(L, t) and the joint PDF (f)(L, W, t). 

Thus, in the present work, we make the following ansatz: we assume that the 
coordinate L performs a Brownian motion in an effective potential, which is given by 
the free energy landscape (fTUl) . This implies that the evolution operator for the 
coordinate L is a FP operator of the form of eq. ([9]), where x has to be replaced by L, 
and H (x, //) has to be replaced by F(L, /i), as given by eq. ({TO]) . This is a bold summary 
of the underlying microscopic process. In the resulting model, the form of the evolution 
equation is constrained, but the value of the kinetic coefficient T is still free. We shall 
take advantage of this degree of freedom and check whether it allows us to describe the 
behavior of the work PDF with sufficient fidelity. 

4. Lattice model of proteins 

In the present section we consider a lattice model for proteins under mechanical load, 
that, in spite of its simplicity, is able to reproduce in great detail the outcome of 
experiments performed on real proteins [24-26]. In this model, the state of a N + 1 
aminoacid protein is defined by the set of discrete variables {m k }, k = 1 . . . N. These 
binary variables take the value m k = {m k = 1), if the peptide bond is in the non-native 
(native) configuration. Then the effective hamiltonian reads 

N-l n j 

H cS ({m k },LJ) = e v A v Ii m k ~ f ■ L({m k }, {o^}), (11) 

i=l j=i+l k=i 

where 

L({m k }, {o-ij}) = Uj(TijSij{m) (12) 

0<i<j<N+l 

is the end-to-end distance in the configuration x = ({m k }, {a^}), projected on the 
direction of the applied force, as shown in fig. [U In this hamiltonian the quantity 
€ij < represents the interaction energy between the residues i and j + 1, 1^ is the 
length of the native strand of peptide bonds between residues i and j, or the length of 
the single non-native bond + and the binary variable cry is equal to 1 if the strand 
is parallel, and to —1 if it is antiparallel to the applied force. The quantity Sij(m) is 
equal to 1 if the polypeptide strand starting at i and ending at j is all in the native state, 
and is flanked by bonds in the non-native state, and vanishes otherwise, as explained 
in [24, 25] . For a given protein, the parameters and Uj are obtained by analyzing the 
protein native structure, as given in the Protein Data Bank (PDB). 

Here, we consider in particular the polypeptide PIN1 (PDB code 1I6C) which is 
made up of 39 aminoacids, at a reduced temperature T = 6 (cf. ref. [25] for a detailed 
discussion on the temperature and force scales). 
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Figure 1. Cartoon of the model protein, with a force applied to one of the free ends. 
Dots denote amino acids and dashed lines denote contacts. 



The unfolding experiments are simulated using the Monte-Carlo Metropolis 
algorithm with the hamiltonian pip , where the external force varies linearly with time 
f = r ■ t. For each unfolding trajectory, the system is prepared in equilibrium with 
vanishing force, and then at t = 0, the force starts increasing with rate r. For practical 
purposes, we define the force rate as r = / max /t mM , keeping constant / max = 10, and 
varying t max . We have simulated 10000 unfoldings for each value of the rate r. We have 
then sampled the work W = Jg max d t H eS ({mk}, {o~ij}, f(t)) performed on the molecule 
and evaluated the work histograms. 

For the model protein here considered, the free energy landscape Fq(L), as defined 



can be exactly calculated [24, 25] , and the time- dependent landscape reads therefore 
F(L,f(t)) = F (L) — f(t)L. In eq. (fTBl) . x represents the microscopic state of the 
model, i.e., the collection of the variables {m^} representing the state of the bonds, and 
of the variables {cr^} representing the orientation of the strands with respect to the 
reference direction. 

The free energy landscape F(L,f) of this polypeptide in plotted in fig. ([2]), for 
different values of the external force. Inspection of this figure indicates that at vanishing 
external force, the potential is almost flat for L < 3 nm, while for larger values of the 
force a minimum appears at L* ~ 12.6 nm, whose position is practically independent 
of /, indicating that L* represents the length of the fully stretched molecule [25]. As 
discussed in section [3j we will take this energy landscape as effective potential in the 
differential operator ([9]). 

In figure H] we show the histograms of the work PDF, as obtained by the simulations 
discussed above, for four different values of the manipulation rate r = / max /t max . In 
the same figure, we plot the probability distribution function as obtained by solving a 
discretized version of the equations (J4HS]) . 

In this approach the equations take the form of a master equation, in which the 
states are identified by an integer i, where Lj is the polymer length measured in units 



by 




(13) 
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Figure 2. Free energy landscape F(L, /) for the model PIN1 polypeptide, for different 
value of the external force. The force is expressed in reduced units, see ref. [25]. 
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Figure 3. Real and imaginary parts of the unconstrained generating function 
^(A, i max ) = ^(A, imax) +i*i(A, t max ) vs. A, as obtained from the numerical solutions 
of eqs. ft©, with T = 13.75, and r = 1 (i max = 10). 



AL = L mSuX /N, where L max is the maximum length that can be obtained in the lattice 
model, and iV = 126. Positive and negative values of L are considered. The transition 

rates Wi > j±i are defined to match those of a Metropolis process with an attempt 

frequency equal to Y: 

'l, ii F^Jit)) < F{LiJ(t)); 

otherwise. 



W l ^ l(=i±1) (t) = Tx! \ {F{L .j it)) _ FiLiJm '. ' ' " " (14) 



The resulting equations can then be solved, by a classic Runge-Kutta method, 
when a definite value is assigned to the kinetic parameter T. One then evaluates 
the unconstrained generating functions ^r,i(A, t max ) = J dL ^ RiI (L, A, t max ). These 
functions are plotted in fig. El for T = 13.75, and r = 1 (t max = 10). As discussed 
in section [2], the unconstrained work PDF &(W, t max ) is finally obtained by inverting 
eq. ©. 
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For each value of r we consider different values of the kinetic coefficient T, and 
choose the value which most closely reproduces the simulated histograms. Indeed, even 
if the microscopic process goes on with a well-defined characteristic attempt frequency, 
this is not the case for the effective process described by the FP equation. In order 
for x to change, the microscopic variables {m^cr,.,} must change. Their rate of change 
will depend on an Arrhenius factor depending on the actual energy difference due to 
the change of the particular value one is looking at. This factor will depend on the 
instantaneous value of the applied force, as well as on the overall state of the chain, and 
will not be a function only of L. We find nevertheless that it is possible to identify a 
value of T which yields a reasonably good agreement for the faster protocols (r = 1). 
This value decreases as the manipulation speed decreases, showing that in the slower 
manipulations there is a larger frequency of microscopic processes that does not show 
up in changes of L. For slower protocols (r = 0.1, 0.01), while one can match the 
mean value of the calculated work PDF that obtained with simulations, the shape of 
the calculated distribution differs from the histograms. Apparently the intrinsic rate of 
the microscopic processes in these protocols cannot be represented by a single attempt 
frequency, while it can for the faster protocols, which are dominated by simple "snatching 
off" of native regions. For the slowest manipulations (r = 0.001) the work distribution 
becomes Gaussian. In this case, the Jarzynski identity [11,12] implies that its average 
Wo, its variance a^y and the free-energy change AF must be related by 

W = AF + ^. (15) 

Thus it is be possible to recover the distribution by fitting the single parameter T, as 
we can see from the curves for r = 0.001. 

In order to compare quantitatively the work PDFs as obtained by simulations and 
by solution of eqs. (EH6]), for each value of r we exploit the Kolmogorov-Smirnov test [27]. 
Thus, one evaluates the maximal distance D between the cumulative distributions of 
the two work PDFs: 

D = sup x \x cxp (W)-X thc °(W)\, (16) 

where x a (W) — f^oo ® a (W), a =exp/theo, and where & exp (W) is the histogram as 
obtained by simulations, and § theo (W) is the expected distribution, obtained with the 
procedure described in section [2j The quantity D is plotted in fig. [5] as a function of r. 
Inspection of this figure suggests that the smallest values of D are obtained for r = 1 
and r = 0.0001, as indicated by a qualitative analysis of fig. HI 



5. Discussion 



In this work we have investigated on a simple example the relation between the work 
PDF obtained for the same system via its microscopic dynamics and an effective 
Brownian dynamics. We found that the Brownian dynamics works reasonably well for 
the faster protocols, but is off the mark for slower ones, hinting at the existence of several 
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Figure 4. Work PDF for the model protein discussed in the text, for different values 
of the force rate r: r = 1 (a), r = 0.1 (b), r = 0.01 (c), r = 0.001 (d). The value of T 
shown in the legend of each figure, corresponds to the value used to solve numerically 
cqs. (JBJ). 




Figure 5. Kolmogorov-Smirnov distance D between the work distributions as 
obtained by simulations and by eqs. (J4HSJ) , as a function of the loading rate r. 



dynamical time scales in the relaxation of a moderately complex manipulated system. 
Thus, particular care has to be taken when comparing experimental outcomes with the 
results of numerical simulations, when the unfolding of a biopolymer is modelled as a 
biased one-dimensional Brownian process. 
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